Abstract

Background: Genetic predisposition is the primary risk factor for familial breast cancer. For the majority of familial
breast cancer cases, however, the genetic predispositions remain unknown. All newly identified predispositions occur
rarely in the disease population, and the unknown genetic predispositions are estimated to number in the thousands.
The family unit is the basic structure of genetics. Because familial breast cancer is an autosomal dominant disease,
affected individuals in a family should carry the same genetic predisposition across generations. Therefore,
focusing on the cases in lineages of familial breast cancer, rather than on pooled cases from the disease population,
is expected to provide a high probability of identifying the genetic predisposition in each family.

Methods: In this study, we searched for genetic predispositions by analyzing the family-specific variants in familial
breast cancer. Using exome sequencing, we analyzed three families and 22 probands with BRCAx (BRCA1/2-negative)
familial breast cancer.

Results: We observed family-specific, novel, deleterious germline variants in each family. Many of the germline
variants identified were shared among the affected members of the same family but were not found in other families,
each of which had its own specific variants. Certain variants are putative deleterious genetic predispositions that
damage functionally important genes involved in DNA replication and damage repair, tumor suppression, signal
transduction, and phosphorylation.

Conclusions: Our study demonstrates that the predispositions for many BRCAx familial breast cancer families can be
specific to each family. A family-focused approach has the potential to detect many new predispositions.



Background 

Breast cancer is a leading cancer in women [1]. About
10-20% of breast cancer cases show familial clustering, with
multiple family members affected by the disease [2].
Genetic predispositions are the major risk factor for the
disease. However, genetic predispositions are currently
known for only 30-40% of families with familial breast
cancer. The remaining 60-70% of women with familial
breast cancer have unknown predispositions and are
classified as BRCAx [3]. It is estimated that the "missing"
heritability in BRCAx families likely consists of thousands
of rare variants, each conferring a minor disease risk [4].
Indeed, broad screening of variants across disease
populations has uncovered multiple new genetic
predispositions for familial breast cancer. A consistent
pattern among these newly classified predispositions is that
they are always present at very low frequencies in the
given disease population [5-10]. Their extreme rarity
implies that larger disease populations must be sampled
to identify the germline predispositions [10]. However,
such an expansion would increase the complexity of data
analysis, the experimental costs, and the time needed. As
such, focusing only on rare variants is unlikely to
determine the entire spectrum of genetic predispositions
in BRCAx familial breast cancer families. New alternative
hypotheses and approaches must be explored. For example,
mosaic mutations have been implicated as potential
predispositions for familial breast cancer [11].

Familial breast cancer is defined as an autosomal
dominant genetic disease [12]. Although breast cancer
incidence often exhibits atypical Mendelian patterns due
to factors such as the low penetrance of genetic
predispositions, the predisposition in a disease-prone
family is expected to be transmitted across generations
and shared between family members. Focusing on each
family with a history of the disease is therefore expected
to improve the chance of detecting the family's
predisposition, compared with screening a disease
population of pooled cases without family relationships,
which can dilute a predisposition that is highly prevalent
in one family to an insignificant level.

We hypothesize that the unknown predispositions for
many BRCAx familial breast cancer families are specific to
each family with a history of the disease. Our previous
exome study of a BRCAx familial breast cancer family
revealed an abundance of genetic variants [13]. In the
present study, we expand the exome sequencing study by
analyzing three families with BRCAx familial breast cancer,
in which 17 members had cancer and five members were
without cancer. Our study also includes 22 probands of
BRCAx familial breast cancer. Our study reveals the
presence of family-specific, novel, deleterious genetic
variants as putative genetic predispositions in each family
with BRCAx familial breast cancer.

Methods 

Use of human subjects 

The use of the patient samples for the study was approved
by the Institutional Review Boards (IRB) of Creighton
University School of Medicine (#00-12265) and University
of Nebraska Medical Center (718-11-EP). All subjects
signed the Consent to Participate Form for the cancer
genetic study.

Individuals from three families with BRCAx breast
cancer were used to generate exome sequences as we
have previously described [13]. Family I included six
individuals with breast cancer and two individuals
without breast cancer. Family II included five
individuals with breast cancer, one obligate carrier, and
two individuals without breast cancer. Family III included
five individuals with breast cancer and one individual
without breast cancer. Additionally, 22 probands with
BRCAx familial breast cancer were included in exome
sequencing. All cases used in the study were BRCA1-
and BRCA2-negative; 41 were female and 3 were male,
and the average age was 42 years (Figure 1, Table 1).

Exome sequencing 

For each sample, exome sequencing used DNA from blood
cells. Exome libraries were constructed using the TruSeq
Exome Enrichment Kit (62 Mb, Illumina, San Diego, CA)
according to the manufacturer's procedures. Exome
sequences were collected on a HiSeq™ 2000 sequencer
(Illumina, San Diego, CA) in paired-end mode (2 × 100).
All exome data were deposited in the Sequence Read
Archive (SRA) database of the National Center for
Biotechnology Information (NCBI) (Accession numbers
SAMN02404413-SAMN02404456).

Exome sequence mapping and variant calling 

Exome sequences were mapped to the human genome
reference sequence hg19 by Bowtie2 with default
parameters in paired mode [14]. The resulting SAM files
were converted to BAM files. Duplicates were removed
using Picard (http://picard.sourceforge.net). The mapped
reads were locally realigned using the RealignerTargetCreator
tool from the Genome Analysis Toolkit (GATK) [15]. The
base quality scores were recalibrated using BaseRecalibrator
(GATK) with NCBI dbSNP build 137 from the GATK resource
bundle for reference sequence hg19. VarScan 2 was used
for variant calling [16]. VarScan 2 was run on pileup data
generated from BAM files using SAMtools utilities [17]. The
mpileup command was run with the -B parameter to disable
base alignment quality (BAQ) computation and otherwise
default parameters, with the minimum read depth at 10 and
the minimum base quality at 30. The called variants were
annotated with ANNOVAR using the software-provided
databases of the Reference Sequence (RefSeq; NCBI), dbSNP
137, the 1000 Genomes Project, and the National Heart,
Lung, and Blood Institute (NHLBI) Exome Sequencing
Project (ESP) 6500 (http://evs.gs.washington.edu).
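The mapping and calling steps above can be written out as a sequence of command lines. The sketch below only assembles the commands as strings and runs nothing. It assumes GATK 3.x (the release line that provided RealignerTargetCreator), the legacy Picard `I=`/`O=` argument style, and that the "minimum read depth 10" and "minimum base quality 30" settings correspond to VarScan 2's `--min-coverage` and `--min-avg-qual` options. The text names only RealignerTargetCreator, so the IndelRealigner step that GATK 3 pairs with it, the sort/index steps, and all file names (for example `hg19.fa`, `dbsnp_137.hg19.vcf`) are hypothetical.

```python
import shlex


def exome_commands(sample, r1, r2, ref="hg19.fa", bt2_index="hg19",
                   dbsnp="dbsnp_137.hg19.vcf", picard="picard.jar",
                   gatk="GenomeAnalysisTK.jar", varscan="VarScan.jar"):
    """Return shell command strings for one sample, in execution order.

    Nothing is executed; file and jar names are hypothetical placeholders.
    """
    sam, srt, dedup = f"{sample}.sam", f"{sample}.sorted.bam", f"{sample}.dedup.bam"
    intervals, realigned = f"{sample}.intervals", f"{sample}.realigned.bam"
    table, recal = f"{sample}.recal.table", f"{sample}.recal.bam"
    pileup, snvs = f"{sample}.pileup", f"{sample}.snv.txt"

    def gatk3(tool, *args):
        # GATK 3.x invocation: tool chosen with -T, reference given with -R
        return ["java", "-jar", gatk, "-T", tool, "-R", ref, *args]

    steps = [
        # Bowtie2 with default parameters in paired-end mode
        ["bowtie2", "-x", bt2_index, "-1", r1, "-2", r2, "-S", sam],
        # SAM -> coordinate-sorted BAM
        ["samtools", "sort", "-o", srt, sam],
        # Picard: remove (not merely mark) duplicate reads
        ["java", "-jar", picard, "MarkDuplicates", f"I={srt}", f"O={dedup}",
         f"M={sample}.dup_metrics.txt", "REMOVE_DUPLICATES=true"],
        ["samtools", "index", dedup],
        # Local realignment: create targets, then realign reads at those targets
        gatk3("RealignerTargetCreator", "-I", dedup, "-o", intervals),
        gatk3("IndelRealigner", "-I", dedup, "-targetIntervals", intervals, "-o", realigned),
        # Base quality score recalibration against dbSNP build 137
        gatk3("BaseRecalibrator", "-I", realigned, "-knownSites", dbsnp, "-o", table),
        gatk3("PrintReads", "-I", realigned, "-BQSR", table, "-o", recal),
        # Pileup with base alignment quality (BAQ) disabled via -B
        ["samtools", "mpileup", "-B", "-f", ref, "-o", pileup, recal],
    ]
    cmds = [shlex.join(step) for step in steps]
    # VarScan 2 SNV calls: depth >= 10, base quality >= 30; VarScan writes to stdout
    varscan_cmd = ["java", "-jar", varscan, "mpileup2snp", pileup,
                   "--min-coverage", "10", "--min-avg-qual", "30"]
    cmds.append(shlex.join(varscan_cmd) + " > " + shlex.quote(snvs))
    return cmds


if __name__ == "__main__":
    for cmd in exome_commands("F1_II-1", "F1_II-1_R1.fastq.gz", "F1_II-1_R2.fastq.gz"):
        print(cmd)
```

GATK 3 also expects read-group tags on the reads and `.fai`/`.dict` files for the reference; the text does not describe how these were produced, so they are left out of the sketch.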

Figure 1 Pedigrees of the three families used in the study. BC (breast cancer), Bt (brain tumor), CRC (colorectal cancer), Lu (lung cancer),
En (endometrial cancer), Ki (kidney cancer), Lym (lymphoma), NHL (non-Hodgkin lymphoma), OC (ovarian cancer), Pro (prostate cancer),
Sar (sarcoma), Sk (skin cancer).

Variants that matched these databases were classified as
known variants and removed. Family-specific normal
variants were eliminated by removing the variants
shared between the affected and the unaffected members
of each family. The remaining novel variants were
classified into synonymous, non-synonymous, splicing-site
change, and stop-gain or stop-loss groups. The variants
causing synonymous changes were then removed. For the
remaining variants, PolyPhen-2 was used to identify
variants with deleterious effects on the affected genes
(HumVar scores: probably damaging, 0.909-1; possibly
damaging, 0.447-0.908; benign, 0-0.446) [18]. The variants
classified as benign were removed. These steps generated
a list of novel, deleterious variants present only in the
cancer-affected family members and probands. Note that
the variants in probands were filtered by population
databases only.
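The filtering cascade described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `Variant` record, the effect labels, and the function names are hypothetical. Variants without a PolyPhen-2 score (splice-site and stop changes) are kept, which matches the splicing entries listed with NA scores in Table 3. Passing no unaffected relatives reproduces the note that proband variants were filtered by population databases only (we read this as skipping the family-normal step).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    effect: str                        # "synonymous", "nonsynonymous", "splicing", "stopgain", "stoploss"
    polyphen2: Optional[float] = None  # HumVar score; None where not applicable (e.g. splicing)

    @property
    def key(self):
        return (self.chrom, self.pos, self.ref, self.alt)


def polyphen2_class(score):
    """Map a PolyPhen-2 HumVar score to the class boundaries quoted in Methods."""
    if score is None:
        return None
    if score >= 0.909:
        return "probably damaging"
    if score >= 0.447:
        return "possibly damaging"
    return "benign"


def novel_deleterious(variants, known, unaffected=()):
    """Keep variants that are (1) absent from population databases, (2) absent from
    every unaffected relative, (3) not synonymous, and (4) not PolyPhen-2 benign."""
    normal = set().union(*unaffected)  # variant keys seen in any unaffected relative
    kept = []
    for v in variants:
        if v.key in known or v.key in normal:
            continue
        if v.effect == "synonymous":
            continue
        if polyphen2_class(v.polyphen2) == "benign":
            continue
        kept.append(v)
    return kept
```

Here `known` is the set of keys matched in dbSNP137, ESP6500 or 1000 Genomes, and `unaffected` is one set of keys per unaffected family member.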



Power calculation 

Using a two-sided paired t-test and assuming a genetic
relative risk (GRR) of 5.8, a disease prevalence of 0.03,
a disease locus frequency of 0.01, and a sib recurrence
ratio of 2, a sample size of 20 achieves 81% power to
detect a mutation difference with a standardized effect
size of 0.67 between the affected and the unaffected
members, at a significance level (alpha) of 0.05 [19,20].
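The last step of this power statement can be checked numerically. The sketch below computes the power of a two-sided paired t-test for a standardized effect size d = 0.67, n = 20 and alpha = 0.05 by integrating the noncentral t distribution, using only the Python standard library. It does not reproduce how d was derived from the GRR, prevalence, locus frequency and sib recurrence ratio, which depends on the cited methods [19,20]; the function names are illustrative.

```python
import math
from statistics import NormalDist


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def t_crit(alpha, df):
    """Two-sided critical value t_(1 - alpha/2, df), found by bisection on the t CDF."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 0.5 + simpson(pdf, 0.0, mid) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def paired_t_power(d, n, alpha=0.05):
    """Power of a two-sided paired (one-sample) t-test for standardized effect size d.

    The statistic is noncentral t: T = (Z + delta) / sqrt(V / df), with
    delta = d * sqrt(n), Z ~ N(0, 1) and V ~ chi-square(df); we integrate over V.
    """
    df, delta = n - 1, d * math.sqrt(n)
    tc = t_crit(alpha, df)
    phi = NormalDist().cdf
    k = df / 2
    log_norm = k * math.log(2) + math.lgamma(k)

    def integrand(v):
        if v <= 0:
            return 0.0
        chi2_pdf = math.exp((k - 1) * math.log(v) - v / 2 - log_norm)
        s = tc * math.sqrt(v / df)
        # P(T > tc | V = v) + P(T < -tc | V = v)
        return (phi(delta - s) + phi(-s - delta)) * chi2_pdf

    return simpson(integrand, 0.0, df + 40 * math.sqrt(2 * df), 4000)


if __name__ == "__main__":
    print(round(paired_t_power(0.67, 20), 3))
```

With these inputs the computed power should come out close to 0.8, in line with the reported 81%. Setting d = 0 returns alpha, which is a quick sanity check of the integration.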



Validation 

Sanger sequencing was used to validate deleterious
variants. Sense and antisense PCR primers for each
selected variant were designed using the Primer3 program.
The original DNA samples used in exome sequencing
served as PCR templates. PCR amplicons were subjected
to BigDye sequencing. The resulting sequences were
evaluated using the CLC Genomics Workbench program
(Cambridge, MA) to confirm the variants called from
exome sequences.

Results 

Mapping exome data and calling variants 

Exome sequences were generated from a blood sample
from each study participant and mapped to the human
genome reference sequence hg19. Variants were called
from the mapping data. We focused on single-base, non- 
Table 1 BRCAx familial breast cancer cases used in the study



Family Cancer type Pathology 



BRCA1/2 



Exome 



Reads 



Bases    Bases map rate (%)    Coverage (x)    Variants called



Family 1 
1 

2 
3 
4 
5 



Family 2 
1 

2 
3 
4 
5 



Family 3 
1 

2 
3 
4 
5 
6 

Probands 
1 

2 

3 

4 

5 

6 

7

8 

9 

10 

11 

12 

13 

14 

15 



Breast 
Breast 
Breast 
Prostate 
No Cancer 
Breast, Colon 
Brain 
Breast 

Breast, Breast 

Obligate carrier

Breast 

Breast 

Breast 

Breast 

No Cancer 

No Cancer 

Endometrial
Breast, Skin 
No Cancer 
Breast 



Infiltrating ductal 
Not available 
Infiltrating ductal 
Adenocarcinoma 

Adenocarcinoma 
Not available 
Adenocarcinoma 



Breast 
Breast 

Breast 
Breast 
Breast 
Breast 
Breast 
Breast 
Breast 
Breast 
Breast 
Breast 
Breast 
Breast 
Breast 
Breast 
Breast 
Breast 



Medullary, infiltrating ductal 

Infiltrating ductal 
Ductal carcinoma in situ 
Infiltrating ductal 
Infiltrating ductal 



Adenocarcinoma 
Basal, infiltrating ductal 

Infiltrating ductal 
Ductal carcinoma in situ 
Not available 

Ductal carcinoma in situ 
Invasive ductal carcinoma 
Invasive ductal carcinoma 
Invasive ductal carcinoma 
Invasive ductal carcinoma 
Invasive lobular carcinoma 
Invasive ductal carcinoma 
Invasive ductal carcinoma 
Invasive ductal carcinoma 
Invasive ductal carcinoma 
Invasive ductal carcinoma 
Ductal carcinoma in situ 
Ductal carcinoma in situ 
Invasive ductal carcinoma 
Invasive ductal carcinoma 
Invasive ductal carcinoma 



42,973,730 
40,158,059 
46,240,754 
23,418,595 
40,313,161 
17,496,012 
36,166,319 
27,830,687 

33,419,098 
27,261,117 
40,973,473 
29,561,523 
25,790,969 
37,657,589 
17,433,912 
35,977,512 

33,662,978 
29,648,460 
53,411,156 
31,736,845 
35,014,538 
38,418,769 

17,832,681 
36,166,319 
50,944,516 
43,889,986 
40,125,408 
31,798,628 
49,739,415
63,352,269 
43,744,840 
43,573,311 
40,938,838 
36,258,870 
34,550,745 
50,295,200 
60,736,566 
57,383,360 



4,340,346,730 
4,055,963,959 
4,670,316,154 
2,365,278,095 
4,071,629,261 
1,767,097,212 
3,652,798,219 
2,810,899,387 

3,375,328,898 
2,753,372,817 
4,138,320,773 
2,985,713,823 
2,604,887,869 
3,803,416,489 
1,760,825,112 
3,633,728,712 

3,399,960,778 
2,994,494,460 
5,394,526,756 
3,205,421,345 
3,536,468,338 
3,880,295,669 

1,801,100,781 
3,652,798,219 
5,145,396,116 
4,432,888,586 
4,052,666,208 
3,211,661,428 
5,023,680,915 
6,398,579,169 
4,418,228,840
4,400,904,411
4,134,822,638 
3,662,145,870 
3,489,625,245 
5,079,815,200 
6,134,393,166 
5,795,719,360 



97.6 
98.3 
97.2 
98.1 
98.0 
97.9 
99.5 
96.3 

92.9 
92.4 
99.6 
91.5 
93.1 
91.6 
91.6 
97.3 

93.2 
98.3 
98.8 
98.3 
98.4 
97.5 

93.1 
99.5 
98.4 
99.6 
99.5 
97.5 
99.6 
99.6 
99.5 
99.6 
99.3 
99.6 
99.4 
99.5 
99.7 
99.6 



70 
65 
75 
38 
66 
28 
59 
45 

54 
44 
67 
48 
42 
61 
28 
59 

55 
48 
87 
52 
57
62 

29 

59 

83 

71 

65 

52 

81 

103 

71 

71 

67 

59 

56 

82 

99 

93 



184,865 
152,692 
176,554 
207,103 
213,347 
183,741 
171,425 
104,343

113,079 
115,328 
127,272 
108,655 
84,687 
139,891 
131,786 
128,680 

129,754 
198,862
193,017 
130,941 
129,754 
161,953 

109,864 
142,155 
152,125 
169,633 
153,511 
119,875 
113,058 
99,732 
149,873 
141,236 
143,262 
138,018 
146,858 
156,666
115,909 
120,945 



17 


Breast 


Invasive ductal carcinoma 




44,922,611


4,537,183,711 


99.6 


73 


110,503 


18 


Breast 


Invasive ductal carcinoma 




33,883,509 


3,422,234,409 


99.4 


55 


131,955 


19 


Breast 


Invasive ductal carcinoma 




49,729,619 


5,022,691,519 


99.5 


81 


146,665 


20 


Breast 


Invasive ductal carcinoma 




63,184,143 


6,381,598,443 


99.6 


103 


119,680 


21 


Breast 


Invasive ductal carcinoma 




28,002,381 


2,828,240,481 


99.6 


46 


86,924 


22 


Breast 


Invasive ductal carcinoma 




47,794,798 


4,827,274,598 


99.5 


78 


112,030 


Average 


38,941,211 


3,933,062,277 


97.7 


63 


140,187 









synonymous variants that affect protein coding, splicing, 
and stop-gain or stop-loss mutations, which are reliably
detectable by exome analysis [21]. The average exome
coverage was 63x, and the average number of variants
called was 140,187 per case (Table 1).
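The Coverage column in Table 1 appears to follow from simple arithmetic: in the rows we checked, Bases equals Reads × 101 (101-base reads), and Coverage is close to Bases divided by the 62-Mb TruSeq target. This is our reading of the table rather than a formula stated in the text, and the constants below are assumptions; a couple of rows (for example 3,880,295,669 bases listed as 62x) differ by one unit, suggesting that the exact target size used was slightly above 62 Mb.

```python
READ_LENGTH = 101          # inferred from Table 1: Bases / Reads = 101 in the rows checked
TARGET_BP = 62_000_000     # nominal TruSeq Exome Enrichment Kit target (62 Mb)


def sequenced_bases(reads: int, read_length: int = READ_LENGTH) -> int:
    """Total bases sequenced for a sample."""
    return reads * read_length


def mean_coverage(bases: int, target_bp: int = TARGET_BP) -> float:
    """Raw fold-coverage: all sequenced bases spread over the target.

    Unmapped, duplicate and off-target reads are not excluded, so this
    overstates the on-target depth.
    """
    return bases / target_bp


if __name__ == "__main__":
    reads = 42_973_730                         # first row of Table 1
    bases = sequenced_bases(reads)
    print(bases, round(mean_coverage(bases)))  # prints: 4340346730 70
```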

To increase the likelihood that the variants identified
in the breast cancer-affected family members are
associated with breast cancer, the variants in each data
set were filtered by: 1) removal of common variants
present in human populations, in which all variants
matching population-derived variant databases (i.e.,
dbSNP137, ESP6500, and 1000 Genomes) were removed;
and 2) removal of family-specific normal variants, in
which, for the three families in the study, the variants
shared between the affected and the unaffected members
of the same family were removed. To identify the variants
causing deleterious effects in the affected genes, the
remaining variants were analyzed using the PolyPhen-2
program [18]. A total of 337 novel, deleterious variants
present only in the affected members of Families I, II, and
III were identified, an average of 112 variants per family
(Table 2, Additional file 1: Table S1A, B, C); 689 novel,
deleterious variants were identified in the 22 probands,
an average of 30 variants per proband (Table 2, Additional
file 2: Table S2A, B). Sanger sequencing validated the
mapped variants at a rate of 83% (53/64), supporting the
reliability of the variants identified by exome mapping
analysis (Additional file 1: Table S1D).

Novel deleterious variants are mostly family-specific 

We compared the variants within each family. On
average, 25% of the variants (14% in Family I, 29% in
Family II, and 35% in Family III) were shared by multiple
affected members of a family, whereas 75% on average
(86% in Family I, 71% in Family II, and 65% in Family III)
were present in only a single affected member (Table 2).
We then compared the shared variants among the three
families and found that only one variant was shared
between Family I and Family II, and four variants were
shared between Family I and Family III (Figure 2A). Of
the 689 variants identified in the probands, 82% were
proband-specific, and only 18% were shared between
probands at various frequencies (Figure 2B, Additional
file 2: Table S2A, S2B).



The results indicate that the majority of the novel,
deleterious variants identified in the three families and
probands are family-specific, i.e., present only in one
family and not shared with other families.
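The within-family comparison behind Table 2 and the between-family comparison behind Figure 2A reduce to counting how many affected members carry each filtered variant. The sketch below is a minimal illustration with hypothetical function names, where variant keys can be any hashable value: a variant is "individual" if exactly one affected member carries it and "shared" if two or more do, and family subtotals count unique variants, as the Table 2 footnote states.

```python
from collections import Counter
from itertools import combinations


def sharing_table(members):
    """members: dict member id -> set of filtered variant keys for the affected
    members of one family. Returns per-member (total, individual, shared)
    counts and the family subtotal computed over unique variants."""
    counts = Counter(v for variants in members.values() for v in variants)
    rows = {
        m: (len(vs), sum(counts[v] == 1 for v in vs), sum(counts[v] >= 2 for v in vs))
        for m, vs in members.items()
    }
    subtotal = (len(counts),
                sum(c == 1 for c in counts.values()),
                sum(c >= 2 for c in counts.values()))
    return rows, subtotal


def shared_between_families(shared_by_family):
    """Pairwise overlap of each family's within-family shared variants."""
    return {(a, b): shared_by_family[a] & shared_by_family[b]
            for a, b in combinations(sorted(shared_by_family), 2)}


if __name__ == "__main__":
    rows, subtotal = sharing_table({"1": {"a", "b", "c"}, "2": {"b", "d"}, "3": {"b", "c", "e"}})
    print(rows, subtotal)
```

The same counting applies to the probands if all 22 are passed as one group, matching the footnote that their shared variants are those shared with other probands. As a check against the text, the Family I subtotal of 20 shared out of 143 unique variants gives the quoted 14%.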

Identification of putative genetic predispositions 

We analyzed the mutations shared between the affected
members of the same family, the functional class of the
mutated genes, and existing evidence for their
contribution to cancer. In doing so, we identified
putative predispositions in Families I, II, and III and in
the probands (Table 3, Additional file 1: Table S1A, S1B,
S1C). For Family I, this was PTEN-Induced Putative
Kinase 1 (PINK1); for Family II, these were Lysine (K)
Acetyltransferase 6B (KAT6B) and Neurogenic Locus Notch
Homolog Protein 2 (NOTCH2); and for Family III, this was
Phosphorylase Kinase Beta (PHKB).

PINK1 is a mitochondrial serine/threonine-protein
kinase. Mutations in PINK1 cause autosomal recessive
Parkinson's disease [22]. KAT6B is a histone
acetyltransferase involved in DNA replication, gene
expression and regulation, and epigenetic modification of
chromosomal structure [23]. Mutations in KAT6B cause
multiple neurological diseases [24]. NOTCH2 is a member
of the Notch family involved in controlling cell fate
decisions. Low Notch activity leads to hyperproliferative
activity in breast cancer [25], and mutations in NOTCH2
cause Hajdu-Cheney syndrome [26]. PHKB regulates the
function of phosphorylase kinase [27]. Mutations in PHKB
cause glycogen storage disease type 9B [28]. Interestingly,
a variant in Polymerase (DNA-Directed) Kappa (POLK) was
present in Family I member #4. POLK is a member of the
Y family of DNA polymerases and helps the replication
fork bypass DNA lesions [29]. Although we were unable
to validate it owing to the lack of DNA from the subject's
parents, this raises the possibility that the variant is a
de novo mutation in this individual. Multiple
transcription factors were also affected by the mutations
in each family. For example, the following transcription
factors were mutated in Family I: ZNF335, LRRC66, ZNF417,
ZNF587, GTF2I, ZFAND4, EIF4G2, GZF1, CCDC86, ZSCAN18,
ZNF546, TAF1L, and LRIG3 (Additional file 1: Table S1A).
Table 2 Novel, deleterious variants detected in breast cancer-affected cases*

Family      Case          Total (%)    Individual (%)    Shared** (%)
Family 1    1             37           35                2
            2             26           26                0
            3             25           15                10
            4             48           39                9
            6             29           17                12
            7             12           6                 6
            8             14           6                 8
            Subtotal      143 (100)    123 (86)          20 (14)
Family 2    1             22           13                9
            2             15           5                 10
            3             21           9                 12
            4             21           12                9
            5             16           8                 8
            6             8            2                 6
            Subtotal      66 (100)     47 (71)           19 (29)
Family 3    1             39           13                26
            2             48           27                21
            4             21           12                9
            5             32           12                20
            6             41           19                22
            Subtotal      128 (100)    83 (65)           45 (35)
Total                     337 (100)    253 (75)          84 (25)
Probands    1             35           10                25
            2             58           22                36
            3             74           28                46
            4             77           49                28
            5             70           28                42
            6             41           16                25
            7             31           24                7
            8             43           27                16
            9             51           19                32
            10            61           30                31
            11            70           35                35
            12            51           12                39
            13            55           15                40
            14            60           30                30
            15            51           31                20
            16            41           31                10
            17            32           18                14
            18            57           25                32
            19            58           18                40
            20            47           23                24
            21            33           25                8
            22            34           22                12
            Total         689 (100)    568 (82)          121 (18)
            Per proband   30           26                6

*The counts in subtotals and totals are numbers of unique variants.
**Shared with other members of the same family, or shared with other probands.



The variant data from probands show patterns similar to
those of the three families (Table 3). Of the 22 probands,
four carried variants affecting genes involved in DNA
replication and damage repair. These include Polymerase
(DNA-directed) Theta (POLQ) in Proband #2, RAD23
Homolog B (S. cerevisiae) (RAD23B) in Proband #3, Ligase




Figure 2 Comparison of the variants in BRCAx families and
probands. A. Comparison in the three families. B. Comparison in
the probands. The results show that the variants detected in the
cancer-affected family members are highly family-specific. The higher
rate (18%) of shared variants in the probands is likely due to
the remaining normal variants not filtered in the probands and to the
larger number of families represented by the probands than by the
three families.
Table 3 Putative predispositions in familial breast cancer families and probands



Gene Description 



Position 



Nucleotide    Amino acid    Type    PolyPhen2*    Cancer-affected member    Frequency



Score 
prediction 



Family 1 



GPRIN1 

PINK1 
POLK 



G protein regulated 
inducer of neurite 
outgrowth 1 

PTEN induced 
putative kinase 1 

Polymerase (DNA 
directed) kappa 



Family 2 



KAT6B 
KAT6B 
NOTCH2 Notch 2



K(lysine) 

acetyltransferase 6B 
K(lysine) 

acetyltransferase 6B 



Family 3 

NANP 
PHKB 

Proband 

1 JAKMIP3 



POLQ 

DUX2 
UBE2L3 

RAD23B 

GATA3 
KAT6B 

LIG1 

LIG4 

NOTCH2 
ABL1 



N-acetylneuraminic 
acid phosphatase 

phosphorylase 
kinase, beta 



3 
4 

5 

/ 

8 

9 
10 

14 

15 

16 TNK2 

17 NFRKB 



18



NFKBIZ 



Janus kinase and 
microtubule 
interacting protein 3 

Polymerase (DNA 
directed), theta 

Double homeobox 2 

Ubiquitin-conjugating 
enzyme E2L 3 

RAD23 homolog B 
(S. cerevisiae) 

GATA binding protein 3 

K(lysine) 

acetyltransferase 6B 

Ligase I, DNA, 
ATP-dependent 

Ligase IV, DNA, 
ATP-dependent 

Notch 2 

c-abl oncogene 1, 
non-receptor tyrosine 
kinase 

Tyrosine kinase, 
non-receptor, 2 

Nuclear factor related to 
kappaB binding protein 

Nuclear factor of kappa 
light polypeptide gene 
enhancer 



chr5:176026123 c.T713G p.L238S Exonic 0.91 D

chr1:20972051 c.960-2A>G Splicing NA NA

chr5:74892737 c.A2219G p.H740R Exonic 0.62 P

chr10:76789128 c.G4546T p.D1516Y Exonic 0.95 D

chr10:76789311 c.C4729T p.R1577C Exonic 0.96 D

chr1:120459167 c.C6178T p.R2060C Exonic 0.99 D

chr20:25596725 c.A583G p.I195V Exonic 0.98 D

chr16:47628126 c.1204+1G>T Splicing NA NA

chr10:133955524 c.G1574C p.G525A Exonic 1.00 D

chr3:121207798 c.A3980G p.Q1327P Exonic 1.00 D

chr10:135494906 Splicing NA NA

chr22:21975938 c.G349A p.E117K Exonic 0.96 D

chr9:110087260 c.C1028T p.P343L Exonic 0.99 D

chr10:8100630 c.C604T p.R202C Exonic 0.92 D

chr10:76744854 c.G2390A p.S797N Exonic 0.98 D

chr19:48637322 c.G1525A p.E509K Exonic 0.95 D

chr13:108862463 c.G1154A p.R385K Exonic 1.00 D

chr1:120529603 c.G854A p.R285H Exonic 1.00 D

chr9:133729493 c.G122A p.G41D Exonic 0.92 D

chr3:195596385 c.C1760T p.P587L Exonic 1.00 D

chr11:129755398 c.G611A p.R204H Exonic 1.00 D

chr3:101576029 Splicing NA NA



12 3 4 6 7 
- + + + + + 



1 2 3 4 5 6 

- + + + + + 

- + + + + + 

--+-- + 
1 2 4 5 6 
+ - + - - 
19 SMG1 SMG1 phosphatidylinositol 3-kinase-related kinase chr16:18879624 c.C3083T p.T1028M Exonic 0.99 D

20 PRKCQ Protein kinase C, theta chr10:6528042 c.G855C p.Q285H Exonic 1.00 D

21 ADRA2A Adrenoceptor alpha 2A chr10:112838117 c.C363G p.C121W Exonic 1.00 D

22 PPFIA4 Protein tyrosine phosphatase, receptor type chr1:203025582 c.C668T p.T223M Exonic 0.92 D

*D: Probably damaging (score: 0.909-1); P: Possibly damaging (score: 0.447-0.908).

I, DNA, ATP-dependent (LIG1) in Proband #9, and
Ligase IV, DNA, ATP-dependent (LIG4) in Proband #10.
POLQ repairs apurinic sites [30]. RAD23B plays a role
in nucleotide excision repair [31]. LIG1 ligates nascent
DNA of the lagging strand, and mutations in LIG1 cause
replication errors, genome instability, and cancer [32].
LIG4 catalyzes double-strand break repair by
non-homologous end joining, and mutations in LIG4 cause
LIG4 syndrome [33]. Several variants are found in
well-known oncogenes and tumor suppressor genes, such as
GATA Binding Protein 3 (GATA3) in Proband #7 and
Abelson Murine Leukemia Viral Oncogene Homolog 1
(ABL1) in Proband #18. GATA3 regulates luminal
epithelial cell differentiation in the mammary gland
[34,35]. Abnormal expression of GATA3 has been linked to
luminal A-type breast cancer [36-38]. ABL1 is a tyrosine
kinase that controls cell differentiation and division. It
is involved in the t(9;22) translocation, which forms the
BCR-ABL fusion gene in chronic myelogenous leukemia
(CML) [39]. Several individual variants in different cases
affect the same gene at different positions. For example,
in Proband #8, a variant in KAT6B (c.G1841A/p.S614N)
affects the HAT domain toward the N terminus, whereas
two variants in KAT6B in Family II (c.G3997T/p.D1333Y
and c.C4180T/p.R1394C) affect the Met-rich domain at the
C terminus. In Proband #14 and Family II, two different
NOTCH2 variants (c.G854A/p.R285H and c.C6178T/p.R2060C)
were present. Multiple variants affect genes involved in
phosphorylation. These include Tyrosine Kinase
Non-Receptor 2 (TNK2) in Proband #16, SMG1
Phosphatidylinositol 3-Kinase-Related Kinase (SMG1) in
Proband #19, Protein Kinase C Theta (PRKCQ) in Proband
#20, and Protein Tyrosine Phosphatase, Receptor Type F
Interacting Protein Alpha 4 (PPFIA4) in Proband #22.

We also performed a pathway-level analysis by annotating
the mutation-affected genes in the three families using
the KEGG database (http://www.genome.jp/kegg/pathway.html).
Certain mutations were found to affect several functional
pathways. For example, the genes mutated in Family I
(ACADVL, AHCY, ALDOA, SGPL1, MAT1A, GALNT8, GGT1) are
involved in metabolic pathways; the genes mutated in
Family II (NOTCH2, DUSP16) are involved in the Notch and
MAPK signaling pathways; and the genes mutated in
Family III (SLC9A1, ITGAX, ITGAD) are involved in the
regulation of the actin cytoskeleton.

Discussion 

The majority of families with familial breast cancer lack
evidence for their genetic predispositions. Efforts in the
past decade have made slow progress in determining the
unknown genetic predispositions. Currently, the
population-based approach is adopted as the major
promising tool to reach this goal [40]. One weakness of
this approach is that it can "dilute out the effects of a
very strong association in a small subset of the study
population" [41]. It requires a disease population of tens
of thousands, yet the predispositions identified will likely
remain very rare in the disease population. Owing to their
extreme rarity, such genetic predispositions are often
difficult to confirm in different disease populations and
to distinguish from normal polymorphisms [5,10]. Our
study observed family-specific, novel, deleterious variants
and putative predispositions in the families and probands
analyzed. This implies that, in addition to the
population-based approach, a family-based approach
provides another option for determining genetic
predispositions.

Based on the higher frequencies of well-known
predispositions identified by traditional approaches, the
rarity of the predispositions recently identified by the
population-based approach, and the presence of
family-specific, novel, deleterious variants in disease
families revealed in our study, we propose a model to
explain the genetic predispositions in familial breast
cancer (Figure 3). In this model, the predisposition in
BRCA1 has the highest frequency in the familial breast
cancer population, other known predispositions have
progressively lower frequencies down to insignificant
levels, and the predispositions for many BRCAx familial
breast cancers are family-specific. The model explains the
difficulty of using traditional and population-based
approaches to determine the unknown predispositions, and
suggests that a family-focused approach can determine the
genetic predispositions for many BRCAx disease families.
This model can be further tested in a larger number of
BRCAx familial breast cancer families.
Figure 3 A model for the genetic predispositions in familial breast cancer. The known predisposition in BRCA1 has the highest sharing
frequency in the disease population, other known predispositions decrease in frequency towards extreme rarity in the disease population,
and family-specific predispositions are enriched in many disease families without known predispositions. The biggest circle represents the
entire set of genetic predispositions in familial breast cancer. The open circles represent the shared, known predispositions, and the black
circles represent the family-specific predispositions.



Our study aimed to determine whether germline mutations
are present, rather than to achieve comprehensive coverage
of germline mutations in each family. We achieved this
by eliminating all variants matched in population-derived
variant databases (i.e., dbSNP137, ESP6500, 1000 Genomes)
to avoid, as far as possible, variants representing normal
polymorphisms. Including such variants as predisposition
candidates, even with a cut-off such as minor allele
frequency (MAF) < 0.01, can increase the sensitivity but
decrease the specificity of the variants referred to as
putative predispositions.
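The trade-off described above can be made concrete with a hypothetical filter that supports both policies: the strict policy used in this study, which drops any variant found in dbSNP137, ESP6500 or 1000 Genomes, and a MAF cut-off policy. The data layout, the function name, and the treatment of database entries without a frequency (dropped in both modes) are assumptions for illustration.

```python
def population_filter(variants, db_maf, max_maf=None):
    """variants: iterable of variant keys.
    db_maf: dict key -> minor allele frequency for variants present in a
    population database (None if the database gives no frequency).

    max_maf=None -> strict: drop every database match (the policy used in this study).
    max_maf=0.01 -> lenient: also keep database matches whose known MAF is below the cut-off.
    """
    kept = []
    for v in variants:
        if v not in db_maf:
            kept.append(v)  # novel: absent from all databases
        elif max_maf is not None and db_maf[v] is not None and db_maf[v] < max_maf:
            kept.append(v)  # rare but known: retained only in lenient mode
    return kept


if __name__ == "__main__":
    db = {"b": 0.002, "c": 0.2, "d": None}
    print(population_filter(["a", "b", "c", "d"], db))
    print(population_filter(["a", "b", "c", "d"], db, max_maf=0.01))
```

The lenient mode can only return a superset of the strict result; the extra rare-but-known variants are where the gain in sensitivity and the loss in specificity both come from.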

Assigning a specific mutation as the true predisposition
of a disease family requires solid phenotypic evidence
from in vitro analysis, cell line tests, literature review,
bioinformatic data analysis, and animal models. This is
best illustrated by the establishment of BRCA1 germline
mutations as genetic predispositions in breast cancer, for
which the definitive conclusion about their contribution to
breast cancer was based on mouse models that developed
breast cancer with germline-mutated BRCA1 [42]. Our
current study aims to provide evidence that BRCAx
disease families are enriched with germline damaging
mutations, such that focusing on each disease family will
be required to determine the genetic predisposition in
each family. Indeed, even under strict mapping
conditions, large numbers of mutations were detected in
each disease family and proband. While the data provide
a rich resource for identifying the true predisposition of
each disease family, these variants cannot be considered
true predispositions without further phenotypic and
functional evidence.

Conclusions 

Our study shows that genetic predispositions in many 
BRCAx familial breast cancer families can be family-specific. 
Additional file 1: Table S1. Variants detected in breast cancer-affected
Table S1B. Family 2; Table S1C. Family 3; Table S1D. Variants shared among
the three families; Table S1E. Variants validated by Sanger sequencing.
BRCAx: Familial breast cancer without known mutations in BRCA1 and BRCA2;
Proband: the first affected family member seeking medical attention; Exome
sequencing: sequencing of the entire coding region of a genome using
next-generation DNA sequencing technology; SAM: Sequence Alignment/Map
format, which stores sequence data in a series of tab-delimited ASCII
columns; BAM: a compressed binary form of SAM that can be indexed;
GATK: Genome Analysis Toolkit, a software package for analysing
next-generation resequencing data; VarScan 2: a software package for
detecting variants in next-generation resequencing data; PolyPhen-2:
software for predicting the possible impact of an amino acid substitution
on the structure and function of a protein; Primer3: software for designing
PCR primers; NCBI: National Center for Biotechnology Information; dbSNP:
Single Nucleotide Polymorphism Database; ESP: Exome Sequencing Project;
MAF: Minor Allele Frequency.